fix: [df52] route timestamp timezone mismatches through spark_parquet_convert #3494

andygrove · 2026-02-11T23:04:42Z

Summary

Fix incorrect timestamp timezone handling in the schema adapter on the df52 branch.
INT96 Parquet timestamps coerced to Timestamp(us, None) by DataFusion were being routed through Spark's Cast expression when the logical schema expected Timestamp(us, Some("UTC")). Spark's Cast treats a None timezone as TimestampNTZ (local time) and applies a timezone conversion, shifting values by the session timezone offset (e.g., -5h45m for Asia/Kathmandu).
Route Timestamp -> Timestamp mismatches through CometCastColumnExpr, which delegates to spark_parquet_convert and handles the mismatch as a metadata-only timezone relabel.
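
The difference between the two paths can be illustrated with a minimal, self-contained sketch. It is a simplified model and not Comet's or Arrow's actual code: `TimestampColumn`, `relabel_timezone`, and `cast_as_local_time` are hypothetical names, and the session timezone is reduced to a fixed offset in seconds (20700 for Asia/Kathmandu, UTC+5:45).

```rust
/// Simplified stand-in for a timestamp column (hypothetical type, not an Arrow array).
#[derive(Debug, Clone, PartialEq)]
struct TimestampColumn {
    /// Microseconds since the Unix epoch.
    micros: Vec<i64>,
    /// `None` models Timestamp(us, None); `Some("UTC")` models Timestamp(us, Some("UTC")).
    timezone: Option<String>,
}

const MICROS_PER_SECOND: i64 = 1_000_000;

/// Metadata-only relabel: attach a timezone and leave every value untouched.
/// This models the behavior the PR description attributes to the fixed path.
fn relabel_timezone(col: &TimestampColumn, tz: &str) -> TimestampColumn {
    TimestampColumn {
        micros: col.micros.clone(),
        timezone: Some(tz.to_string()),
    }
}

/// Models the buggy path: values are interpreted as local wall-clock time in the
/// session timezone and converted to UTC by subtracting the session offset.
fn cast_as_local_time(col: &TimestampColumn, session_offset_secs: i64) -> TimestampColumn {
    TimestampColumn {
        micros: col
            .micros
            .iter()
            .map(|v| v - session_offset_secs * MICROS_PER_SECOND)
            .collect(),
        timezone: Some("UTC".to_string()),
    }
}

fn main() {
    // Values as they might come out of an INT96 column: already UTC instants,
    // but carrying no timezone label.
    let from_parquet = TimestampColumn {
        micros: vec![0, 1_700_000_000_000_000],
        timezone: None,
    };

    // Asia/Kathmandu is UTC+5:45, i.e. 5 * 3600 + 45 * 60 = 20700 seconds.
    let kathmandu_offset_secs = 5 * 3600 + 45 * 60;

    let relabeled = relabel_timezone(&from_parquet, "UTC");
    let shifted = cast_as_local_time(&from_parquet, kathmandu_offset_secs);

    println!("relabeled: {:?}", relabeled);
    println!("shifted:   {:?}", shifted);
}
```

In this model the relabel keeps the stored microsecond values identical and changes only the timezone label, while the cast path moves each value back by 5h45m, which matches the shift described above.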

Test plan

Existing test "SortMergeJoin with unsupported key type should fall back to Spark" in CometJoinSuite should now pass
Existing Rust tests (parquet_roundtrip_int_as_string, parquet_roundtrip_unsigned_int) continue to pass
CI passes

🤖 Generated with Claude Code

…_convert

INT96 Parquet timestamps are coerced to Timestamp(us, None) by DataFusion but the logical schema expects Timestamp(us, Some("UTC")). The schema adapter was routing this mismatch through Spark's Cast expression, which incorrectly treats None-timezone values as TimestampNTZ (local time) and applies a timezone conversion. This caused results to be shifted by the session timezone offset (e.g., -5h45m for Asia/Kathmandu).

Route Timestamp->Timestamp mismatches through CometCastColumnExpr which delegates to spark_parquet_convert, handling this as a metadata-only timezone relabel without modifying the underlying values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

andygrove · 2026-02-12T13:49:54Z

CI Comparison: PR #3494 vs df52 baseline (PR #3470)

	PR #3470 (df52 baseline)	PR #3494 (with this fix)
Failing CI jobs	34	30
Passing CI jobs	same set + 4 extra failures	same set

Jobs fixed by this PR (fail in #3470, pass in #3494)

spark-sql-native_datafusion-sql_core-3/spark-3.5.8
spark-sql-native_datafusion-sql_hive-2/spark-3.5.8
ubuntu-latest/Spark 3.5, JDK 17, Scala 2.12/native_datafusion [exec]
ubuntu-latest/Spark 3.5, JDK 17, Scala 2.12/native_datafusion [sql]

Regressions introduced by this PR (pass in #3470, fail in #3494)

None — every failure in #3494 also exists in #3470.

Summary

This PR is a strict improvement over the df52 baseline: 4 fewer failing CI jobs with zero regressions.

Note

This comment was generated with the assistance of AI (Claude Code) and should be verified independently.

andygrove marked this pull request as ready for review February 11, 2026 23:06

andygrove merged commit f0652aa into apache:df52 Feb 12, 2026
81 of 111 checks passed
